Systems and methods for determining aneuploidy risk using sample fetal fraction

ABSTRACT

Disclosed herein are system, method, and computer program product embodiments for determining aneuploidy risk in a target sample of maternal blood or plasma based on the amount of fetal DNA. An embodiment operates by receiving known genetic data from known prenatal testing samples and genetic data for the target sample. A fetal fraction distribution is determined for the known genetic data based on gestational age and the maternal weight associated with the target sample. A model is then generated based on a fixed ratio reduction of the determined fetal fraction distribution. A fetal fraction based data likelihood for the target sample is then determined for each of the plurality of ploidy states using the generated model. An aneuploidy risk score is then outputted based on applying a Bayesian probability determination that combines each fetal fraction based data likelihood with a previously determined risk score as a conditional value.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application No. 62/182,085, filed Jun. 19, 2015, which is hereby incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

The present invention generally relates to molecular biology methods and systems, and more specifically to methods and systems for determining aneuploidy risk in a target maternal blood sample.

BACKGROUND

Noninvasive prenatal testing using cell-free DNA (cfDNA) can be used to detect abnormalities in a fetus. As a result, noninvasive prenatal testing is rapidly becoming part of clinical care for pregnant women.

Noninvasive prenatal testing is used to determine the genetic state of a fetus from genetic material that is obtained in a noninvasive manner, for example from a blood draw on the pregnant mother. The blood could be separated and the plasma isolated, and size selection can optionally be used to isolate the DNA of the appropriate length. This isolated DNA can then be measured by a number of means, such as by hybridizing to a genotyping array and measuring the fluorescence, or by sequencing on a high throughput sequencer.

Single Nucleotide Polymorphism (SNP) based noninvasive prenatal testing is one type of noninvasive prenatal test. SNP based noninvasive prenatal testing is often used to screen for fetal aneuploidies. But the accuracy of SNP based tests is dependent on the amount of fetal DNA present in a maternal blood or plasma sample. SNP based testing returns no call results when the amount of fetal DNA is not sufficient to provide the desired accuracy.

Low amounts of fetal DNA may be caused by a number of factors. One common factor is maternal weight. For example, as maternal weight increases, the amount of fetal DNA in maternal blood, plasma, or other fluids often decreases. Thus, SNP based noninvasive prenatal tests that screen for fetal aneuploidies are sometimes unavailable to pregnant women.

BRIEF DESCRIPTION OF THE DRAWINGS

The presently disclosed embodiments will be further explained with reference to the attached drawings, wherein like structures are referred to by like numerals throughout the several views. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently disclosed embodiments.

FIG. 1 illustrates a fetal fraction distribution, according to an example embodiment.

FIG. 2 illustrates a log normal fetal fraction distribution, according to an example embodiment.

FIG. 3A illustrates a generated model for 19 weeks gestational age, according to an example embodiment.

FIG. 3B illustrates a generated model for 13 weeks gestational age, according to an example embodiment.

FIG. 4 is a flow chart of a method according to one embodiment of the invention.

FIG. 5 illustrates an example computer system for performing embodiments of the present invention.

FIG. 6 illustrates an example system for performing embodiments of the present invention.

FIG. 7 illustrates a posterior fetal fraction risk distribution, according to an example embodiment.

FIG. 8 illustrates a result set for an example embodiment for fetal fraction-based high risk assessment that predicts an aneuploidy in cases with low fetal fraction.

FIG. 9 illustrates a redraw success rate distribution, according to an example embodiment.

FIG. 10 illustrates a distribution of fetal fraction based risk scores in cases identified as high risk and low fetal fraction, according to an example embodiment.

FIG. 11A illustrates an estimated detection rate for trisomy 13 and 18, according to an example embodiment.

FIG. 11B illustrates an estimated detection rate for digynic triploidy, according to an example embodiment.

FIG. 12 illustrates a probability density function (PDF) of normalized euploid data, according to an example embodiment.

FIG. 13 illustrates a cumulative distribution function (CDF) of normalized euploid data, according to an example embodiment.

FIG. 14 illustrates a plot of redraw success rate, according to an example embodiment.

FIG. 15 illustrates a result set for identified high risk samples, according to an example embodiment.

While the above-identified drawings set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments can be devised by those skilled in the art which fall within the scope and spirit of the principles of the presently disclosed embodiments.

DETAILED DESCRIPTION

Provided herein are system, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for determining aneuploidy risk in a target sample of maternal blood, plasma, or other fluid based on the amount of fetal DNA. Such embodiments may be used in situations where a low or extremely low fetal fraction renders traditional aneuploidal risk methodologies inconclusive or inaccurate. For example, such embodiments may be used to determine the risk of trisomy 13, trisomy 18, or maternal triploidy, which are all aneuploidies associated with a low or extremely low fetal fraction. An embodiment operates by receiving genetic data for a target sample (sample of interest) of maternal blood, plasma, or other fluid, and known genetic data from a plurality of known noninvasive prenatal testing samples. A fetal fraction distribution is determined for the received known genetic data based on gestational age and the maternal weight associated with the target sample. A model for a plurality of ploidy states is then generated based on a fixed ratio reduction of the determined fetal fraction distribution. A fetal fraction based data likelihood for the target sample is then determined for each of the plurality of ploidy states using the generated model and the fetal fraction associated with the target sample. An aneuploidy risk score is then output for the target sample based on applying a Bayesian probability determination that combines each fetal fraction based data likelihood with a previously determined risk score as a conditional value. Accordingly, aneuploidy risk can be determined in a target sample even when the sample contains a low amount of fetal DNA. This allows aneuploidy risk to be determined even when SNP based noninvasive prenatal testing would be unreliable or unavailable.

The term “obtaining genetic data” as used herein refers to both, unless indicated otherwise by context, (1) acquiring DNA sequence information by laboratory techniques, e.g. use of an automated high throughput DNA sequencer, and (2) acquiring information that had been previously obtained by laboratory techniques, wherein the information is electronically transmitted to an analyzer, e.g. by computer over the Internet, by electronic transfer from the sequencing device, etc.

The term “aneuploidy” refers to the state where the wrong number of chromosomes is present in a cell. In the case of a somatic human cell it refers to the case where a cell does not contain 22 pairs of autosomal chromosomes and one pair of sex chromosomes. In the case of a human gamete, it refers to the case where a cell does not contain one of each of the 23 chromosomes. In the case of a single chromosome, it refers to the case where more or fewer than two homologous but nonidentical chromosomes are present, and where each of the two chromosomes originates from a different parent.

The term “ploidy state” refers to the quantity and chromosomal identity of one or more chromosomes in a cell.

Certain aneuploidies are often associated with a reduced amount of fetal DNA in the target sample. For example, trisomy 13, trisomy 18, and maternal triploidy are often associated with a reduced amount of fetal DNA in the target sample. Embodiments of the invention determine aneuploidy risk in a target maternal sample based on a relationship between the amount of fetal DNA in the target sample and the presence of certain aneuploidies.

FIG. 6 illustrates a system for performing embodiments of the present invention.

FIG. 6 includes an analysis system 602 for determining a risk of fetal aneuploidy. Analysis system 602 may include one or more processors for executing the functions described herein. Such functions may be implemented on the processor as engines or logical elements that perform the analytical functionality described herein, such as a modeling engine and a probability engine. Interaction of a user with such analytical engines may be conducted through an appropriate user interface. Analysis system 602 may be coupled to a database of known samples 604 via, for example, a network 610. Network 610 may be any type of communication network, including intranets, local area networks, or wide area networks such as the Internet. Genetic data from samples with a known ploidy state may be used to form a baseline for comparison with a target sample in question, as discussed further below. Database 604 may be a collection of data from a variety of sources including clinical studies and commercial data sets.

In an embodiment, a fetal fraction distribution is defined for such known genetic data from the plurality of known prenatal testing samples by analysis system 602. The fetal fraction distribution may be based on the maternal weight and the gestational age corresponding to each sample. This is because gestational age and maternal weight are often factors for the amount of fetal DNA present in a maternal blood sample. A fetal fraction distribution may be defined for known genetic data from a plurality of noninvasive known prenatal testing samples.

The plurality of known prenatal testing samples may be selected based on various criteria to ensure an accurate and representative fetal fraction distribution. In an embodiment, known genetic data for a known prenatal testing sample may be selected or filtered for inclusion in the fetal fraction distribution based on an associated low aneuploidy risk result, a no call result due to low fetal fraction, and a low confidence result. Known genetic data for a known prenatal testing sample may also be selected based on whether the maternal weight associated with the sample is available or whether the sample was collected in a clinical trial in the United States or a foreign country. A selection based on country of origin may be done to prevent unit conversion uncertainty in maternal weight for the sample.

In an embodiment, known genetic data for a plurality of known prenatal testing samples may be grouped into sets according to gestational age and maternal weight. In an embodiment, known genetic data for the plurality of known prenatal testing samples may include sample data taken at a gestational age ranging from 9 to 20 weeks at one week increments. Known genetic data for the plurality of known prenatal testing samples may also include sample data corresponding to a maternal weight ranging from 110 to 250 pounds at 20 pound increments. In an embodiment, sampling of the known genetic data from known prenatal testing samples may be accurate to within plus or minus ten days of gestational age and plus or minus five pounds of maternal weight.

In an embodiment, the average fetal fraction is computed for the known genetic data in each set of known prenatal testing samples. The standard deviation may also be computed. In an embodiment, the average fetal fraction and standard deviation are only computed for sets of known prenatal testing samples containing at least 50 samples. This may be done to ensure an accurate and representative fetal fraction distribution. The result is a grid of distribution parameters (e.g. average fetal fractions and standard deviations) that correspond to the grid of sample conditions.
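The grouping described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `fetal_fraction_grid`, the tuple layout of each sample, and the use of the sample standard deviation are assumptions made for the example. The grid ranges, the plus or minus ten day and five pound windows, and the 50 sample minimum are taken from the text.

```python
from statistics import mean, stdev

def fetal_fraction_grid(samples, min_samples=50):
    """Build a grid of (average, standard deviation) of fetal fraction.

    samples: iterable of (gestational_age_days, maternal_weight_lb, fetal_fraction)
    tuples (an assumed layout for this sketch).
    Returns {(gestational_age_weeks, maternal_weight_lb): (mean, stdev)}.
    """
    samples = list(samples)
    grid = {}
    for ga_weeks in range(9, 21):             # 9 to 20 weeks, one week steps
        for weight_lb in range(110, 251, 20):  # 110 to 250 pounds, 20 pound steps
            ffs = [
                ff
                for ga_days, weight, ff in samples
                if abs(ga_days - 7 * ga_weeks) <= 10  # within ten days
                and abs(weight - weight_lb) <= 5      # within five pounds
            ]
            # Skip sparsely populated grid points.
            if len(ffs) >= min_samples:
                grid[(ga_weeks, weight_lb)] = (mean(ffs), stdev(ffs))
    return grid
```

Because the ten day window is wider than the one week grid spacing, a sample may contribute to more than one neighboring gestational age set in this sketch.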

FIG. 1 illustrates an example fetal fraction distribution based on known genetic data from a plurality of known prenatal testing samples grouped according to gestational age and maternal weight, according to an example embodiment. In the example of FIG. 1, a set of known prenatal testing samples is grouped together based on the gestational ages of 9 weeks, 12 weeks, and 18 weeks. Moreover, each set of known prenatal testing samples is further grouped together based on maternal weight. The average fetal fraction is computed for the known genetic data of each resulting set of known prenatal testing samples.

For example, in FIG. 1, the average fetal fraction of prenatal testing samples with a maternal weight of 200 lbs. and a gestational age of 9 weeks is around 0.06. The average fetal fraction of prenatal testing samples with a maternal weight of 200 lbs. and a gestational age of 12 weeks is around 0.07. The average fetal fraction of prenatal testing samples with a maternal weight of 200 lbs. and a gestational age of 18 weeks is around 0.08.

Given a particular gestational age and maternal weight, a fetal fraction distribution may become more symmetric when transformed to log space. Therefore, modeling of fetal fraction may be conducted in log space.

In an embodiment, the fetal fraction distribution may be transformed to a log-normal distribution. In other words, the fetal fraction distribution may be transformed to a continuous probability distribution of the fetal fraction whose logarithm is normally distributed. Specifically, the logarithm of the fetal fraction is assumed Gaussian distributed with a mean and standard deviation that are a function of gestational age and maternal weight for the known genetic data of the known prenatal testing samples.

FIG. 2 illustrates an example log normal fetal fraction distribution based on the transformation of a fetal fraction distribution to log space, according to an example embodiment. In the example of FIG. 2, the logarithm of the fetal fraction is assumed Gaussian distributed with a mean and standard deviation that are a function of gestational age and maternal weight for the known prenatal testing samples.

In the example of FIG. 2, for known genetic data for around 800 known prenatal testing samples, the gestational age is 10 weeks plus or minus 10 days and the maternal weight is 230 pounds plus or minus 5 pounds. Thus, in FIG. 2, the log normal fetal fraction distribution represents a probability density function (PDF) that describes the relative likelihood for fetal fraction to take on a given value where the gestational age is around 10 weeks plus or minus 10 days and the maternal weight is 230 pounds plus or minus 5 pounds.
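As an illustration of the log space transformation, the following sketch fits the Gaussian parameters to the logarithm of the fetal fractions in one set and evaluates the resulting log-normal density. The function names are hypothetical, and the use of the natural logarithm is an assumption, since the text does not state the base.

```python
import math
from statistics import NormalDist, mean, stdev

def fit_log_normal(fetal_fractions):
    """Fit a Gaussian to log(fetal fraction) for one set of known samples."""
    logs = [math.log(ff) for ff in fetal_fractions]
    return NormalDist(mu=mean(logs), sigma=stdev(logs))

def fetal_fraction_pdf(log_dist, ff):
    """Log-normal density of the fetal fraction itself.

    The 1/ff factor comes from the change of variables x = log(ff).
    """
    return log_dist.pdf(math.log(ff)) / ff
```

For example, `fit_log_normal([0.05, 0.10, 0.20])` yields a distribution centered at `log(0.10)`, the logarithm of the geometric mean of the inputs.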

In an embodiment, the probability of having an aneuploidy can be computed from the log normal fetal fraction distribution. Specifically, the probability of having an aneuploidy can be computed as the integral of the PDF over a defined range.
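The integral of the PDF over a range can be evaluated as a difference of cumulative distribution values in log space. The sketch below assumes natural logarithms and uses a hypothetical function name; the parameter values in the usage note are placeholders, not values from the disclosure.

```python
import math
from statistics import NormalDist

def probability_in_range(mu, sigma, ff_low, ff_high):
    """Integrate the log-normal PDF of fetal fraction over [ff_low, ff_high].

    mu and sigma are the mean and standard deviation of log(fetal fraction).
    """
    log_dist = NormalDist(mu, sigma)
    return log_dist.cdf(math.log(ff_high)) - log_dist.cdf(math.log(ff_low))
```

With `mu = log(0.10)` and `sigma = 0.5`, the range from `0.10 * exp(-0.5)` to `0.10 * exp(0.5)` spans one standard deviation on either side of the mean in log space, so the result is about 0.68.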

In an embodiment, the effect of an aneuploidy may be modeled as a fixed rate reduction in the average fetal fraction compared to the expected average fetal fraction for a given maternal weight and gestational age. For example, the average fetal fraction of a trisomy 13 pregnancy may be 80% of the average fetal fraction for a euploid pregnancy of the same maternal weight and gestational age. Trisomy 13, trisomy 18, and maternal triploidy may be modeled using a fixed rate reduction in the average fetal fraction. As would be appreciated by a person of ordinary skill in the art, the effect of an aneuploidy may be modeled according to various other reductions in the average fetal fraction compared to the expected average fetal fraction for a given maternal weight and gestational age.

In an embodiment, a model may be generated for a plurality of ploidy states based on the fixed ratio reduction of the fetal fraction distribution. A ploidy state may be referred to as a hypothesis.

A fetal fraction distribution may be transformed to a log-normal distribution of fetal fraction prior to generation of a model. In an embodiment, a model may be generated for three hypotheses: trisomy 13, trisomy 18, and maternal triploidy.

In an embodiment for a log-normal distribution of fetal fraction, a fixed rate reduction in the average fetal fraction corresponds to a constant subtracted offset. Thus, for a pregnancy with a particular gestational age and maternal weight, the log fetal fraction for euploid prenatal testing samples is Gaussian distributed with a mean m and a standard deviation s, but the log fetal fraction for prenatal testing samples with an aneuploidy is Gaussian distributed with a mean m-c and the same standard deviation s, where c is a constant subtracted offset for a given aneuploidy. The standard deviation is unchanged because scaling the fetal fraction by a fixed ratio shifts its logarithm without changing its spread. As would be appreciated by a person of ordinary skill in the art, a constant subtracted offset for a given aneuploidy may be determined by an analysis of empirical data.

In an embodiment, there may be a single constant subtracted offset for trisomies 13 and 18 and a different offset for maternal triploidy. In an embodiment, the constant subtracted offset for trisomies 13 and 18 is log(0.79). In other words, in this example, the means of the trisomy 13 and trisomy 18 hypothesis distributions are each reduced by log(0.79), which corresponds to an average fetal fraction of about 79% of the euploid average.

In an embodiment, the constant subtracted offset for maternal triploidy is log(0.22). In other words, in this example, the mean of the maternal triploidy hypothesis distribution is reduced by log(0.22).
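A minimal sketch of the hypothesis model follows. It assumes natural logarithms, assumes that a fixed ratio r moves the mean of the log fetal fraction to m + log(r) (a downward shift, since r is less than one), and assumes the standard deviation is left unchanged. The ratios 0.79 and 0.22 come from the text; the function and dictionary names are hypothetical, and the euploid entry is included only as a baseline.

```python
import math
from statistics import NormalDist

# Fixed fetal fraction ratios relative to a euploid pregnancy.
FETAL_FRACTION_RATIOS = {
    "euploid": 1.0,
    "trisomy_13": 0.79,
    "trisomy_18": 0.79,
    "maternal_triploidy": 0.22,
}

def hypothesis_models(euploid_mu, euploid_sigma, ratios=FETAL_FRACTION_RATIOS):
    """Return one Gaussian over log(fetal fraction) per ploidy hypothesis.

    Each aneuploid mean is the euploid mean shifted by log(ratio); the
    standard deviation is shared across hypotheses (an assumption).
    """
    return {
        name: NormalDist(euploid_mu + math.log(ratio), euploid_sigma)
        for name, ratio in ratios.items()
    }
```

For a euploid distribution centered at a fetal fraction of 0.10, this places the trisomy 13 and trisomy 18 distributions at 0.079 and the maternal triploidy distribution at 0.022.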

Returning to FIG. 6, analysis system 602 may also be coupled to a database 606 containing genetic data for a target sample, either directly or over network 610. Genetic data about the target sample, stored in database 606, may have been obtained from, for example, a sequencer 608. The target sample is one for which a fetal aneuploidy risk is to be determined. While the examples herein will refer to maternal blood, one of skill in the art will recognize that the target sample may be, for example, maternal blood or plasma containing both maternal DNA and fetal DNA. Such DNA may be, for example, cell-free DNA. As would be appreciated by a person of ordinary skill in the art, a target maternal blood sample that contains fetal DNA may be obtained using various methods.

In some embodiments of the invention, the obtained prenatal target sample is modified using standard molecular biology techniques in order to be sequenced on a DNA sequencer, such as sequencer 608. In some embodiments, the technique will involve forming a genetic library containing priming sites for the DNA sequencing procedure. A plurality of loci may be targeted for site specific amplification. In some embodiments the targeted loci are polymorphic loci, e.g., single nucleotide polymorphisms. In embodiments employing the formation of genetic libraries, libraries may be encoded using a DNA sequence that is specific for the patient, e.g. barcoding, thereby permitting multiple patients to be analyzed in a single flow cell (or flow cell equivalent) of a high throughput DNA sequencer. Although the samples are mixed together in the DNA sequencer flow cell, the determination of the sequence of the barcode permits identification of the patient source that contributed the DNA that had been sequenced.
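The barcode based identification described above can be illustrated with a short sketch. It assumes, purely for illustration, that each read begins with a fixed length barcode and that barcodes are matched exactly; the function name and data layout are hypothetical, and real pipelines typically tolerate sequencing errors in the barcode.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_patient, barcode_length):
    """Assign pooled reads to patients by their leading barcode.

    reads: iterable of sequence strings from a shared flow cell.
    barcode_to_patient: mapping from barcode sequence to patient identifier.
    Reads whose barcode is not recognized are dropped.
    """
    reads_by_patient = defaultdict(list)
    for read in reads:
        patient = barcode_to_patient.get(read[:barcode_length])
        if patient is not None:
            # Store the read with its barcode removed.
            reads_by_patient[patient].append(read[barcode_length:])
    return dict(reads_by_patient)
```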

Methods are known in the art for obtaining genetic data from a sample. Typically this involves amplification of DNA in the sample, a process which transforms a small amount of genetic material to a larger amount of genetic material that contains a similar set of genetic data. This can be done by a wide variety of methods, including, but not limited to, Polymerase Chain Reaction (PCR), ligation-mediated PCR, degenerate oligonucleotide primed PCR, Multiple Displacement Amplification, allele-specific amplification techniques, Molecular Inversion Probes (MIP), padlock probes, other circularizing probes, and combinations thereof. Many variants of the standard protocol can be used, for example increasing or decreasing the times of certain steps in the protocol, increasing or decreasing the temperature of certain steps, increasing or decreasing the amounts of various reagents, etc. The DNA amplification transforms the initial sample of DNA into a sample of DNA that is similar in the set of sequences, but of much greater quantity. In some cases, amplification may not be required.

The genetic data of the target sample can be transformed from a molecular state to an electronic state by measuring the appropriate genetic material using tools and/or techniques taken from a group including, but not limited to: genotyping microarrays, and high throughput sequencing. Some high throughput sequencing methods and systems include Sanger DNA sequencing, pyrosequencing, the ILLUMINA SOLEXA platform, ILLUMINA's GENOME ANALYZER, ILLUMINA's HISEQ or MISEQ, APPLIED BIOSYSTEM's SOLiD platform, ION TORRENT'S PGM or PROTON platforms, HELICOS's TRUE SINGLE MOLECULE SEQUENCING platform, HALCYON MOLECULAR's electron microscope sequencing method, or any other sequencing method. All of these methods physically transform the genetic data stored in a sample of DNA into a set of genetic data that is typically stored in a memory device en route to being processed.

In an embodiment, a fetal fraction based data likelihood for a target sample may be computed by analysis system 602 for each ploidy state (e.g., trisomy 13, trisomy 18, and maternal triploidy) using the generated model and the fetal fraction associated with the target sample, where each ploidy state corresponds to a hypothesis. Specifically, a fetal fraction based data likelihood for a target sample may be computed for each hypothesis (e.g. trisomy 13, trisomy 18, maternal triploidy, etc.) by evaluating the Gaussian probability density function of that hypothesis at the observed log value of the fetal fraction associated with the target sample.
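The likelihood evaluation can be sketched as below, assuming natural logarithms and a model supplied as a mapping from hypothesis name to the mean and standard deviation of the log fetal fraction. The function name and the mapping layout are assumptions for illustration.

```python
import math
from statistics import NormalDist

def data_likelihoods(observed_fetal_fraction, models):
    """Evaluate each hypothesis' Gaussian PDF at log(observed fetal fraction).

    models: {hypothesis_name: (mu, sigma)} in log fetal fraction space.
    Returns {hypothesis_name: likelihood}.
    """
    x = math.log(observed_fetal_fraction)
    return {
        name: NormalDist(mu, sigma).pdf(x)
        for name, (mu, sigma) in models.items()
    }
```

A sample whose fetal fraction sits near the center of an aneuploid distribution and far in the lower tail of the euploid distribution receives a much larger likelihood under the aneuploid hypothesis.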

FIG. 3A illustrates an example of a generated model for trisomy 13, trisomy 18, and maternal triploidy based on a fixed ratio reduction of a determined fetal fraction distribution, according to an embodiment. Specifically, FIG. 3A illustrates an example of a generated model for trisomy 13, trisomy 18, and maternal triploidy where the gestational age is 19 weeks and the maternal weight is 166 pounds. Thus, in FIG. 3A, a fetal fraction based data likelihood for a target sample with a gestational age of 19 weeks and a maternal weight of 166 pounds may be computed for trisomy 13, trisomy 18, and maternal triploidy by evaluating the respective Gaussian probability density function at the observed log value of the fetal fraction associated with the target sample.

For example, in FIG. 3A, the fetal fraction based data likelihood of trisomy 13 or trisomy 18 for a target sample with a fetal fraction of 0.10, a maternal weight of 166 pounds, and a gestational age of 19 weeks is around 35%. Similarly, the fetal fraction based data likelihood of trisomy 13 or trisomy 18 for a target sample with a fetal fraction of 0.20, a maternal weight of 166 pounds, and a gestational age of 19 weeks is around 10%.

FIG. 3B illustrates an example of a generated model for trisomy 13, trisomy 18, and maternal triploidy based on a fixed ratio reduction of a determined fetal fraction distribution, according to an embodiment. Specifically, FIG. 3B illustrates an example of a generated model for trisomy 13, trisomy 18, and maternal triploidy where the gestational age is 13 weeks and the maternal weight is 166 pounds. Thus, in FIG. 3B, a fetal fraction based data likelihood for a target sample with a gestational age of 13 weeks and a maternal weight of 166 pounds may be computed for trisomy 13, trisomy 18, and maternal triploidy by evaluating the respective Gaussian probability density function at the observed log value of the fetal fraction associated with the target sample.

By determining fetal fraction based data likelihoods for different ploidy states using a generated model for a target sample, an aneuploidy risk score for the fetus associated with the target sample may be determined. Specifically, in an embodiment, each fetal fraction based data likelihood can be combined with a previously determined risk score in order to determine the aneuploidy risk score for the fetus associated with the target sample. A previously determined risk score may be, for example, an age based prior risk score for the mother associated with the target sample. In another example, a previously determined risk score may be a SNP-based prior risk score. As would be appreciated by a person of ordinary skill in the art, a previously determined risk score may be based on other prior risk factors, including a combination of prior risk factors.

In an embodiment, an aneuploidy risk score for the fetus associated with the target sample may be determined based on the posterior probability of the presence of any of trisomy 13, trisomy 18, and maternal triploidy. Specifically, the fetal fraction based data likelihoods may be combined with previously determined risk scores for trisomy 13, trisomy 18, and maternal triploidy using Bayes' theorem to determine an aneuploidy risk score for the fetus associated with the target sample. In an embodiment, the previously determined risk scores for trisomy 13 and trisomy 18 depend on maternal age and gestational age and may be determined empirically. In an embodiment, the previously determined risk score for maternal triploidy is 1/5505.
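One way to read the Bayesian combination is sketched below: each likelihood is multiplied by its prior and the products are normalized over all hypotheses, including a euploid hypothesis. Normalizing jointly across hypotheses is an assumption of this sketch. The maternal triploidy prior of 1/5505 is from the text; the trisomy 13 and trisomy 18 priors shown are hypothetical placeholders, since the text states only that they depend on maternal age and gestational age.

```python
def posterior_risks(likelihoods, priors):
    """Apply Bayes' theorem: posterior is proportional to likelihood times prior.

    likelihoods and priors are dicts keyed by hypothesis name.
    """
    joint = {name: likelihoods[name] * priors[name] for name in likelihoods}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}

# Example priors. Only the maternal triploidy value comes from the text;
# the trisomy values are placeholders for age dependent empirical priors.
EXAMPLE_PRIORS = {
    "trisomy_13": 1 / 5000,          # hypothetical
    "trisomy_18": 1 / 2000,          # hypothetical
    "maternal_triploidy": 1 / 5505,  # stated in the text
}
EXAMPLE_PRIORS["euploid"] = 1.0 - sum(EXAMPLE_PRIORS.values())
```

When every hypothesis has the same likelihood, the posterior equals the prior, which is a quick sanity check on the normalization.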

FIG. 4 is a flowchart of a method 400 for determining aneuploidy risk in a target maternal blood sample, according to an embodiment. Method 400 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. Such processing logic may be implemented in, for example, analysis system 602.

In step 402 of FIG. 4, known genetic data from a plurality of known noninvasive prenatal testing samples is received. As would be appreciated by a person of ordinary skill in the art, the known genetic data from the plurality of known prenatal testing samples may be received from a variety of sources including clinical studies and commercial data sets. Moreover, as would be appreciated by a person of ordinary skill in the art, a fetal fraction distribution may be defined for known genetic data from a plurality of noninvasive known prenatal testing samples, a plurality of invasive known prenatal testing samples, or a combination of both.

The received known genetic data from the plurality of known prenatal testing samples may be optionally filtered based on various criteria to ensure that an accurate and representative fetal fraction distribution is determined in step 406. In an embodiment, known genetic data for the known prenatal testing samples may be filtered based on an associated low aneuploidy risk result, a no call result due to low fetal fraction, and a low confidence result. The received known genetic data for the known prenatal testing samples may also be filtered based on whether the maternal weight associated with a sample is available or whether a sample was collected in a clinical trial in the United States or a foreign country. The filtering based on country of origin may be done to prevent unit conversion uncertainty in maternal weight for a sample.

In step 404 of FIG. 4, genetic data for a target maternal blood sample containing fetal DNA is received. The genetic data includes at least gestational age of the associated fetus, a maternal weight, and a fetal DNA fraction of the target sample. As would be appreciated by a person of ordinary skill in the art, a target maternal blood sample that contains fetal DNA may be obtained using various methods.

In step 406 of FIG. 4, a fetal fraction distribution is determined for the known genetic data from step 402. The determined fetal fraction distribution is based on the maternal weight and the gestational age associated with the target blood sample of step 404. In other words, the received known genetic data for the plurality of known prenatal testing samples is grouped into sets according to gestational age and maternal weight. As discussed above, the sampling of the known genetic data from known prenatal testing samples may be done at various intervals for gestational age and maternal weight.

For each set of known prenatal testing samples, the average fetal fraction is then computed. In an embodiment, the average fetal fraction may only be computed where a set of known prenatal testing samples includes a minimum number of 50 samples. This may be done to ensure an accurate and representative fetal fraction distribution.

In step 408 of FIG. 4, the fetal fraction distribution is transformed to a log-normal distribution. In an embodiment, the logarithm of the fetal fraction is assumed Gaussian distributed with a mean and standard deviation that are a function of gestational age and maternal weight for the received known genetic data of step 402. As would be appreciated by a person of ordinary skill in the art, the log normal fetal fraction distribution represents a PDF that describes the relative likelihood for fetal fraction to take on a given value where the gestational age and the maternal weight are equal to the gestational age and the maternal weight associated with the received genetic data for the target sample of step 404.

In step 410 of FIG. 4, a model is generated for a plurality of ploidy states based on the log-normal distribution of fetal fraction of step 408. In an embodiment, trisomy 13, trisomy 18, and maternal triploidy distributions are generated from the log-normal distribution of fetal fraction of step 408. This involves reducing the mean for each of the trisomy 13, trisomy 18, and maternal triploidy distributions by a respective constant subtracted offset. As would be appreciated by a person of ordinary skill in the art, the constant subtracted offsets for the trisomy 13, trisomy 18, and maternal triploidy distributions may be determined experimentally.

In step 412 of FIG. 4, fetal fraction based data likelihoods for the received target sample of step 404 are computed for each of the ploidy states using the generated model of step 410 and the fetal fraction associated with the target sample. In an embodiment, a fetal fraction based data likelihood for the received target sample is computed for trisomy 13, trisomy 18, and maternal triploidy by evaluating the Gaussian probability density functions for trisomy 13, trisomy 18, and maternal triploidy at the observed log value of the fetal fraction associated with the target sample.

In step 414 of FIG. 4, a Bayesian probability determination is applied to combine the fetal fraction based data likelihoods of step 412 with previously determined risk scores. As would be appreciated by a person of ordinary skill in the art, a previously determined risk score may be an age based prior risk score for the mother associated with the target sample or an SNP-based prior risk score.

In step 416 of FIG. 4, aneuploidy risk scores for trisomy 13, trisomy 18, and maternal triploidy are output based on the applying in step 414. As would be appreciated by a person of ordinary skill in the art, the outputting may be performed using various methods and mediums.

In an embodiment, the aneuploidy risk scores for trisomy 13, trisomy 18, and maternal triploidy are independently determined. Because each aneuploidy risk score is an independent posterior probability of the presence of either trisomy 13, trisomy 18, or maternal triploidy, the resulting aneuploidy risk scores can be compared to identify the most likely ploidy state.

In an embodiment, a probability that the sample is euploid is also determined and taken into account.

In this manner, an additional type of analysis is made available to individuals whose aneuploidy risk may not be able to be determined by traditional methods, such as SNP-based methods. This analysis may also be used to confirm a previously determined risk score in situations where extremely low fetal fraction is an issue.

FIG. 7 illustrates a posterior fetal fraction risk distribution, according to an example embodiment. In the example of FIG. 7, a posterior risk distribution is computed by combining data likelihoods with prior risk for a gestational age between 9 and 11 weeks. The cutoff is at 1/100 risk. This sets the fetal fraction limit for a high risk call.
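By way of non-limiting illustration only, the way a 1/100 risk cutoff sets a fetal fraction limit may be sketched as a search for the fetal fraction at which the posterior risk crosses the cutoff. The sketch assumes a single aneuploid state weighed against the euploid state, equal standard deviations in logarithm space, and hypothetical parameter values; none of these are taken from FIG. 7. Under those assumptions the posterior risk decreases monotonically as fetal fraction increases, so bisection applies.

```python
import math


def posterior_risk_at(ff, prior, mean_log_euploid, offset, sd_log):
    """Posterior aneuploidy risk at fetal fraction ff (two-state sketch)."""
    x = math.log(ff)
    mean_log_aneuploid = mean_log_euploid - offset
    # Log-likelihood ratio of aneuploid vs. euploid Gaussians with equal SD.
    log_lr = ((x - mean_log_euploid) ** 2
              - (x - mean_log_aneuploid) ** 2) / (2.0 * sd_log ** 2)
    odds = prior / (1.0 - prior) * math.exp(log_lr)
    return odds / (1.0 + odds)


def fetal_fraction_limit(prior, mean_log_euploid, offset, sd_log,
                         cutoff=0.01, low=1e-4, high=0.5):
    """Fetal fraction below which the posterior risk exceeds the cutoff."""
    args = (prior, mean_log_euploid, offset, sd_log)
    if not (posterior_risk_at(low, *args) > cutoff
            > posterior_risk_at(high, *args)):
        raise ValueError("cutoff is not bracketed by [low, high]")
    for _ in range(60):
        mid = math.sqrt(low * high)  # geometric midpoint (log-space search)
        if posterior_risk_at(mid, *args) > cutoff:
            low = mid
        else:
            high = mid
    return math.sqrt(low * high)


# Hypothetical inputs: 1/1000 prior, 10% euploid median, offset of 0.5.
limit = fetal_fraction_limit(prior=0.001, mean_log_euploid=math.log(0.10),
                             offset=0.5, sd_log=0.35)
```

A sample whose fetal fraction falls below the returned limit would, under these assumptions, receive a posterior risk above 1/100.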

FIG. 8 illustrates a result set for a pilot study of an example embodiment for fetal fraction-based high risk assessment that predicts an aneuploidy in cases with low fetal fraction. The result set of FIG. 8 indicates that the example embodiment for fetal fraction-based risk assessment is able to predict abnormalities in a clinical data sample set. Specifically, in the example of FIG. 8, there were 143 cases with high risk and low fetal fraction. Karyotype was available for 70 of these cases. There was a 10% positive predictive value (PPV) if the associated clinical sample data set was restricted to cases with karyotype and a 4.9% PPV if missing karyotypes were assumed unaffected. FIG. 8 illustrates some of the abnormalities detected in the pilot study.
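By way of non-limiting illustration only, the two PPV figures above can be reproduced with simple arithmetic. The count of 7 confirmed aneuploid cases is taken from the description of FIG. 15 elsewhere in this disclosure; the function name is hypothetical.

```python
def positive_predictive_value(true_positives, total_positive_calls):
    """Fraction of high risk calls that are confirmed affected."""
    if total_positive_calls <= 0:
        raise ValueError("total_positive_calls must be positive")
    return true_positives / total_positive_calls


HIGH_RISK_LOW_FF_CASES = 143  # high risk, low fetal fraction cases
CASES_WITH_KARYOTYPE = 70     # cases for which karyotype was available
CONFIRMED_ANEUPLOID = 7       # per the description of FIG. 15

# PPV restricted to cases with karyotype.
ppv_karyotyped = positive_predictive_value(
    CONFIRMED_ANEUPLOID, CASES_WITH_KARYOTYPE)

# PPV if every case without karyotype is assumed unaffected.
ppv_lower_bound = positive_predictive_value(
    CONFIRMED_ANEUPLOID, HIGH_RISK_LOW_FF_CASES)
```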

FIG. 9 illustrates a redraw success rate distribution, according to an example embodiment. FIG. 9 shows fetal fraction change observed from approximately 3,000 Non-Invasive Prenatal Testing (NIPT) redraws. The example embodiment of FIG. 9 provides useful information when an embodiment for NIPT single-nucleotide polymorphism (SNP) fails to provide a prediction. Specifically, the example embodiment of FIG. 9 provides a fetal fraction-based risk score and a probability of successful call on redraw, making it possible to predict redraw success based on a predicted range of redraw fetal fraction.

FIG. 10 illustrates a distribution of fetal fraction-based risk scores in cases identified as high risk and low fetal fraction, according to a follow-up study of an example embodiment. For example, FIG. 10 shows that roughly 5 cases had a fetal fraction-based risk score of 0.2. In the follow-up study, the objective was to test whether high fetal fraction-based risk predicts aneuploidy in cases with unusually low fetal fraction. An attempt to collect follow-up data was made for 896 samples for which the adjusted fetal fraction was below approximately the 2nd percentile and the maternal weight was available. 525 samples, from domestic clinics and direct sales clinics, were eligible for inclusion in the follow-up study. 143 samples were identified as having high fetal fraction-based risk with low fetal fraction. In particular, the fetal fraction-based risk was greater than 0.01 and the fetal fraction was 2.5 SD below the mean. Karyotype was available for 70 samples.

FIG. 11A illustrates an estimated detection rate for trisomy 13 and 18, according to an example embodiment. Specifically, FIG. 11A illustrates what fraction of affected cases that are not identified by a NIPT SNP embodiment will be identified by the fetal fraction-based risk score >1/100. The estimated detection rate is based on the sample data set of FIG. 10. In FIG. 11A, the estimated detection rate for trisomy 13/18 is 91.4%.

FIG. 11B illustrates an estimated detection rate for digynic triploidy, according to an example embodiment. Specifically, FIG. 11B illustrates what fraction of affected cases that are not identified by a NIPT SNP embodiment will be identified by the fetal fraction-based risk score >1/100. The estimated detection rate is based on the sample data set of FIG. 10. In FIG. 11B, the estimated detection rate for digynic triploidy is 96.6%. Retroactive application of such high risk fetal fraction criteria to 29,000 NIPT cases would have resulted in 432 high risk calls (1.5%). Application of the SNP method would result in 115 (0.4%) high risk calls (for T13, T18, digynic triploidy). This results in a 1.8% combined high risk call rate. The expected aneuploidy rate based on priors was 0.3%. The theoretical PPV was thus 16% (0.3%/1.8%).
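By way of non-limiting illustration only, the call rates and theoretical PPV stated above can be checked with simple arithmetic. The combined call rate and the expected aneuploidy rate are taken as stated in the text rather than recomputed; reading the theoretical PPV as the expected aneuploidy rate divided by the combined call rate assumes that every expected affected case falls among the high risk calls. The ratio evaluates to roughly 0.167, which the text reports as 16%.

```python
TOTAL_NIPT_CASES = 29_000
FF_HIGH_RISK_CALLS = 432   # calls from the fetal fraction criteria
SNP_HIGH_RISK_CALLS = 115  # calls from the SNP method

# Individual call rates computed from the stated counts.
ff_call_rate = FF_HIGH_RISK_CALLS / TOTAL_NIPT_CASES
snp_call_rate = SNP_HIGH_RISK_CALLS / TOTAL_NIPT_CASES

# Values used as stated in the text.
COMBINED_CALL_RATE = 0.018
EXPECTED_ANEUPLOIDY_RATE = 0.003

# Theoretical PPV under the assumption described above.
theoretical_ppv = EXPECTED_ANEUPLOIDY_RATE / COMBINED_CALL_RATE
```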

FIG. 12 illustrates a PDF of normalized euploid data, according to an example embodiment. Specifically, FIG. 12 shows empirical density plots of fetal fractions after normalization. There are 39 density curves. Each of the 39 density curves comes from a set of data with approximately the same maternal weight and gestational age, with between 400 and 500 samples each. Each data set is normalized by its observed mean and variance. The plot in FIG. 12 shows that the Gaussian fit is appropriate because the distributions are very similar.
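By way of non-limiting illustration only, normalizing one data set by its observed mean and variance may be sketched as follows. The sample values are hypothetical, and the use of the population standard deviation is an assumption of this sketch; with 400 to 500 samples per set, the difference from the sample standard deviation is small.

```python
import statistics


def normalize(values):
    """Center values on their mean and scale by their standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (sketch assumption)
    if sd == 0.0:
        raise ValueError("values must not all be identical")
    return [(value - mean) / sd for value in values]


# Hypothetical fetal fractions from one maternal weight / gestational age set.
hypothetical_group = [0.06, 0.08, 0.09, 0.10, 0.11, 0.13, 0.15]
normalized = normalize(hypothetical_group)
```

After normalization each set has zero mean and unit variance, so density curves from sets with different maternal weights and gestational ages can be overlaid and compared directly.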

FIG. 13 illustrates a CDF of the normalized euploid data of FIG. 12, according to an example embodiment. Specifically, FIG. 13 shows empirical cumulative distribution plots of fetal fractions after normalization. There are 39 curves. Each of the 39 curves comes from a set of data with approximately the same maternal weight and gestational age, with between 400 and 500 samples each. Each data set is normalized by its observed mean and variance. The plot in FIG. 13 shows that the Gaussian fit is appropriate because the distributions are very similar.

FIG. 14 illustrates a plot of redraw success rate, according to an example embodiment. Specifically, FIG. 14 plots the redraw success rate against maternal weight bucket center. This plot shows that another characteristic of the fetal fraction distribution is the redraw success rate. Specifically, the ability to make a call is strongly dependent on fetal fraction, and a successful redraw is often based on an increase in fetal fraction between the first and second draw. The ability to predict the probability of success for a redraw is often useful for doctors and patients. This is because many cases with low fetal fraction will not be at high risk for aneuploidy but will still have a low probability of a successful redraw, and so other testing embodiments may be preferred.

FIG. 15 illustrates an example result set for identified high risk samples, according to an embodiment. Specifically, FIG. 15 illustrates a result set for 143 sample cases that were identified as having high extremely low fetal fraction (ELFF) risk based on not having received a successful high or low risk draw call, and having a computed ELFF risk score greater than 0.01. FIG. 15 further illustrates that follow-up results were successfully collected for 70 of these sample cases. Of these 70 sample cases, 7 were found to be aneuploid.

FIG. 15 shows that among the cohort with successful follow-up, the positive predictive value of high ELFF risk is 7/70=10.0%. FIG. 15 further shows that assuming all cases without follow-up are euploid, the positive predictive value is 7/143=4.9%. This value can be considered the lower bound PPV based on the data set of FIG. 15.

Various embodiments can be implemented, for example, using one or more well-known computer systems, such as computer system 500 shown in FIG. 5. Computer system 500 can be any well-known computer capable of performing the functions described herein.

Computer system 500 includes one or more processors (also called central processing units, or CPUs), such as a processor 504. Processor 504 is connected to a communication infrastructure or bus 506.

One or more processors 504 may each be a graphics processing unit (GPU). In an embodiment, a GPU is a processor that is a specialized electronic circuit designed to process mathematically intensive applications. The GPU may have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, videos, etc.

Computer system 500 also includes user input/output device(s) 503, such as monitors, keyboards, pointing devices, etc., that communicate with communication infrastructure 506 through user input/output interface(s) 502.

Computer system 500 also includes a main or primary memory 508, such as random access memory (RAM). Main memory 508 may include one or more levels of cache. Main memory 508 has stored therein control logic (i.e., computer software) and/or data.

Computer system 500 may also include one or more secondary storage devices or memory 510. Secondary memory 510 may include, for example, a hard disk drive 512 and/or a removable storage device or drive 514. Removable storage drive 514 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.

Removable storage drive 514 may interact with a removable storage unit 518. Removable storage unit 518 includes a computer usable or readable storage device having stored thereon computer software (control logic) and/or data. Removable storage unit 518 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device. Removable storage drive 514 reads from and/or writes to removable storage unit 518 in a well-known manner.

According to an exemplary embodiment, secondary memory 510 may include other means, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 500. Such means, instrumentalities or other approaches may include, for example, a removable storage unit 522 and an interface 520. Examples of the removable storage unit 522 and the interface 520 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.

Computer system 500 may further include a communication or network interface 524. Communication interface 524 enables computer system 500 to communicate and interact with any combination of remote devices, remote networks, remote entities, etc. (individually and collectively referenced by reference number 528). For example, communication interface 524 may allow computer system 500 to communicate with remote devices 528 over communications path 526, which may be wired and/or wireless, and which may include any combination of LANs, WANs, the Internet, etc. Control logic and/or data may be transmitted to and from computer system 500 via communication path 526.

In an embodiment, a tangible apparatus or article of manufacture comprising a tangible computer useable or readable medium having control logic (software) stored thereon is also referred to herein as a computer program product or program storage device. This includes, but is not limited to, computer system 500, main memory 508, secondary memory 510, and removable storage units 518 and 522, as well as tangible articles of manufacture embodying any combination of the foregoing. Such control logic, when executed by one or more data processing devices (such as computer system 500), causes such data processing devices to operate as described herein.

Based on the teachings contained in this disclosure, it will be apparent to persons skilled in the relevant art(s) how to make and use embodiments of the invention using data processing devices, computer systems and/or computer architectures other than that shown in FIG. 5. In particular, embodiments may operate with software, hardware, and/or operating system implementations other than those described herein.

It is to be appreciated that the Detailed Description section, and not the Summary and Abstract sections (if any), is intended to be used to interpret the claims. The Summary and Abstract sections (if any) may set forth one or more but not all exemplary embodiments of the invention as contemplated by the inventor(s), and thus, are not intended to limit the invention or the appended claims in any way.

While the invention has been described herein with reference to exemplary embodiments for exemplary fields and applications, it should be understood that the invention is not limited thereto. Other embodiments and modifications thereto are possible, and are within the scope and spirit of the invention. For example, and without limiting the generality of this paragraph, embodiments are not limited to the software, hardware, firmware, and/or entities illustrated in the figures and/or described herein. Further, embodiments (whether or not explicitly described herein) have significant utility to fields and applications beyond the examples described herein.

Embodiments have been described herein with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined as long as the specified functions and relationships (or equivalents thereof) are appropriately performed. Also, alternative embodiments may perform functional blocks, steps, operations, methods, etc. using orderings different than those described herein.

References herein to “one embodiment,” “an embodiment,” “an example embodiment,” or similar phrases, indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of persons skilled in the relevant art(s) to incorporate such feature, structure, or characteristic into other embodiments whether or not explicitly mentioned or described herein.

The breadth and scope of the invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents. 

What is claimed is:
 1. A method for determining aneuploidy risk in a target sample, comprising: receiving known genetic data from a plurality of known noninvasive prenatal testing samples; receiving genetic data for the target sample, the genetic data including a gestational age, a maternal weight, and a fetal fraction associated with the target sample; determining a fetal fraction distribution for the received known genetic data based on the gestational age and the maternal weight associated with the target sample; generating a model for a plurality of ploidy states based on a fixed ratio reduction of the determined fetal fraction distribution compared to an expected average fetal fraction for the gestational age and the maternal weight associated with the target sample; determining a fetal fraction based data likelihood for the target sample for each of the plurality of ploidy states using the generated model and the fetal fraction associated with the target sample; applying a Bayesian probability determination to combine each fetal fraction based data likelihood with a previously determined risk score as a conditional value; and outputting an aneuploidy risk score for the target sample based on the applying.
 2. The method of claim 1, wherein the previously determined risk score is a SNP based risk score.
 3. The method of claim 1, further comprising: transforming the determined fetal fraction distribution to logarithm space, wherein a logarithm of the fetal fraction is assumed Gaussian distributed with a mean and standard deviation that are a function of gestational age and maternal weight for the known prenatal testing samples.
 4. The method of claim 1, wherein determining a fetal fraction based data likelihood for the target sample comprises computing an integral of a probability density function of the generated model.
 5. The method of claim 1, wherein the generated model is associated with trisomy 13.
 6. The method of claim 1, wherein the generated model is associated with trisomy 18.
 7. The method of claim 1, wherein the generated model is associated with maternal triploidy.
 8. The method of claim 1, wherein determining a fetal fraction distribution for the received known genetic data comprises: grouping the genetic data for the plurality of known prenatal testing samples into sets according to gestational age and maternal weight; and generating a grid of distribution parameters corresponding to each set, wherein the distribution parameters include average fetal fraction and standard deviation.
 9. A system for determining aneuploidy risk in a target sample, comprising: means for receiving known genetic data from a plurality of known noninvasive prenatal testing samples; means for receiving genetic data for the target sample, the genetic data including a gestational age, a maternal weight, and a fetal fraction associated with the target sample; means for determining a fetal fraction distribution for the received known genetic data based on the gestational age and the maternal weight associated with the target sample; means for generating a model for a plurality of ploidy states based on a fixed ratio reduction of the determined fetal fraction distribution compared to an expected average fetal fraction for the gestational age and the maternal weight associated with the target sample; means for determining a fetal fraction based data likelihood for the target sample for each of the plurality of ploidy states using the generated model and the fetal fraction associated with the target sample; means for applying a Bayesian probability determination to combine each fetal fraction based data likelihood with a previously determined risk score as a conditional value; and means for outputting an aneuploidy risk score for the target sample based on the applying.
 10. The system of claim 9, wherein the previously determined risk score is a SNP based risk score.
 11. The system of claim 9, further comprising: means for transforming the determined fetal fraction distribution to logarithm space, wherein a logarithm of the fetal fraction is assumed Gaussian distributed with a mean and standard deviation that are a function of gestational age and maternal weight for the known prenatal testing samples.
 12. The system of claim 9, wherein the means for determining a fetal fraction based data likelihood for the target sample comprises means for computing an integral of a probability density function of the generated model.
 13. The system of claim 9, wherein the generated model is associated with trisomy 13.
 14. The system of claim 9, wherein the generated model is associated with trisomy 18.
 15. The system of claim 9, wherein the generated model is associated with maternal triploidy.
 16. The system of claim 9, wherein the means for determining a fetal fraction distribution for the received known genetic data comprises: means for grouping the genetic data for the plurality of known prenatal testing samples into sets according to gestational age and maternal weight; and means for generating a grid of distribution parameters corresponding to each set, wherein the distribution parameters include average fetal fraction and standard deviation.
 17. A system for determining aneuploidy risk in a target sample, comprising: a known testing samples database containing known genetic data from a plurality of known noninvasive prenatal testing samples; a target sample database containing genetic data for at least the target sample, the genetic data including a gestational age, a maternal weight, and a fetal fraction associated with the target sample; an aneuploidy risk analysis system in communication with the known testing samples database and the target sample database, the aneuploidy risk analysis system comprises: a logical element configured to determine a fetal fraction distribution for the received known genetic data based on the gestational age and the maternal weight associated with the target sample; a modeling engine configured to generate a model for a plurality of ploidy states based on a fixed ratio reduction of the determined fetal fraction distribution compared to an expected average fetal fraction for the gestational age and the maternal weight associated with the target sample; and a probability engine configured to determine a fetal fraction based data likelihood for the target sample for each of the plurality of ploidy states using the generated model and the fetal fraction associated with the target sample, apply a Bayesian probability determination to combine each fetal fraction based data likelihood with a previously determined risk score as a conditional value, and output an aneuploidy risk score for the target sample based on the Bayesian probability determination.
 18. The system of claim 17, wherein the previously determined risk score is a SNP based risk score.
 19. The system of claim 17, wherein the modeling engine is further configured to transform the determined fetal fraction distribution to logarithm space, wherein a logarithm of the fetal fraction is assumed Gaussian distributed with a mean and standard deviation that are a function of gestational age and maternal weight for the known prenatal testing samples.
 20. The system of claim 17, wherein the logical element is configured to determine a fetal fraction based data likelihood for the target sample by computing an integral of a probability density function of the generated model.
 21. The system of claim 17, wherein the generated model is associated with trisomy 13.
 22. The system of claim 17, wherein the generated model is associated with trisomy 18.
 23. The system of claim 17, wherein the generated model is associated with maternal triploidy.
 The system of claim 17, wherein the probability engine is configured to determine a fetal fraction distribution for the received known genetic data by grouping the genetic data for the plurality of known prenatal testing samples into sets according to gestational age and maternal weight, and generating a grid of distribution parameters corresponding to each set, wherein the distribution parameters include average fetal fraction and standard deviation.
 24. The system of claim 17, further comprising a DNA sequencer in communication with the target sample database and configured to supply genetic data about the target sample. 